Fast latent semantic indexing of spoken documents by using self-organizing maps
Similar resources
Indexing Audio Documents by using Latent Semantic Analysis and SOM
This paper describes an important application for state-of-the-art automatic speech recognition, natural language processing and information retrieval systems. Methods for enhancing the indexing of spoken documents by using latent semantic analysis and self-organizing maps are presented, motivated and tested. The idea is to extract extra information from the structure of the document collection an...
Latent Semantic Indexing by Self-organizing Map
An important problem in information retrieval from spoken documents is how to extract relevant documents that are poorly decoded by the speech recognizer. In this paper we propose a stochastic index for the documents based on the Latent Semantic Analysis (LSA) of the decoded document contents. The original LSA approach uses Singular Value Decomposition to reduce the dimensionality o...
Thematic indexing of spoken documents by using self-organizing maps
A method is presented to provide a useful searchable index for spoken audio documents. The task differs from traditional (text) document indexing, because large audio databases are decoded by automatic speech recognition and decoding errors occur frequently. The idea in this paper is to take advantage of the large size of the database and select the best index terms for each document with th...
Big Data Categorization for Arabic Text Using Latent Semantic Indexing and Clustering
Document categorization is an important field in the area of natural language processing. In this paper, we propose using Latent Semantic Indexing (LSI), the singular value decomposition (SVD) method, and clustering techniques to group similar unlabeled documents into a pre-specified number of topics. The generated groups are then categorized using a suitable label. For clustering, we used Expectation–...
Emergence of Linguistic Representations by Independent Component Analysis
Our aim is to find the syntactic and semantic relationships and roles of words based on the analysis of corpora. We study three methods for analyzing words in context as potential solutions to this task: latent semantic analysis, the self-organizing map and independent component analysis. Latent semantic analysis is a simple method for automatic generation of concepts that are u...